Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch + report BadZipFile errors in get_wheel_distribution #10535

Merged
merged 6 commits into from
Nov 12, 2021

Conversation

lukasjuhrich
Copy link
Contributor

@lukasjuhrich lukasjuhrich commented Oct 2, 2021

Change

Catch BadZipFile errors and re-raise with some context information.
No functional change, hence no news entry.

Motivation

I'm facing a few nasty CI bugs where the pip install command fails with an uncaught BadZipFile exception. A stacktrace looks like this:

  […N similar lines…]
   Building wheel for wrapt (setup.py): started
  Building wheel for wrapt (setup.py): finished with status 'done'
  Created wheel for wrapt: filename=wrapt-1.12.1-cp39-cp39-linux_x86_64.whl size=78277 sha256=337cd2663d837fd8364f46d472964afd4b05eb1fdf68361b26245a866e714227
  Stored in directory: /opt/pycroft/.cache/pip/wheels/98/23/68/efe259aaca055e93b08e74fbe512819c69a2155c11ba3c0f10
ERROR: Exception:
Traceback (most recent call last):
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 342, in run
    _, build_failures = build(
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/wheel_builder.py", line 336, in build
    wheel_file = _build_one(
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/wheel_builder.py", line 228, in _build_one
    _verify_one(req, wheel_path)
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/wheel_builder.py", line 177, in _verify_one
    dist = get_wheel_distribution(wheel_path, canonical_name)
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/metadata/__init__.py", line 48, in get_wheel_distribution
    return Distribution.from_wheel(wheel_path, canonical_name)
  File "/opt/pycroft/venv/lib/python3.9/site-packages/pip/_internal/metadata/pkg_resources.py", line 43, in from_wheel
    with zipfile.ZipFile(path, allowZip64=True) as zf:
  File "/usr/local/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/usr/local/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Since the information which file caused the error is missing, I don't have any means to further narrow down this issue (e.g. check which files are affected, compare checksums, etc). This change should help me and anyone else facing such issues.

NB

Please note that I have no clue about PIP internals; feel free to make any changes if e.g. the error message is factually incorrect.

@DiddiLeija
Copy link
Member

I don't know if this is necessary. The BadZipFile message looks reasonable to me.

@lukasjuhrich
Copy link
Contributor Author

lukasjuhrich commented Oct 2, 2021

@DiddiLeija Can you elaborate on your remark? As explained, the message is completely unhelpful to me because I don't even know what file or wheel it is about. It might be useful to the expert who knows the precise internals of what pip install does under the hood, but I'd expect it to come across as rather cryptic to anybody else.

(note also that the stacktrace is an excerpt; I have no reason to assume that the wrapt installation has anything to do with it)

@DiddiLeija
Copy link
Member

You're right, my misreading:

Since the information which file caused the error is missing, I don't have any means to further narrow down this issue (e.g. check which files are affected, compare checksums, etc). This change should help me and anyone else facing such issues.

Thinking a bit about this, this could be a good aproach, but we may have to sketch how the message must look like.

BTW, before going ahead, I recommend you to write a news entry, because this change is not trivial (I think). See https://pip.pypa.io/en/stable/development/contributing/#news-entries for more information. Also, you may have to write a test for this behavior.

@uranusjr
Copy link
Member

uranusjr commented Oct 2, 2021

Instead of raising RuntimeError, this should raise either UnsupportedWheel or a new exception subclassing InstallationError (e.g. InvalidWheel). InstallationError is a misnomer (legacy reasons), it basically fails the process with a nice error message instead of a bunch of traceback (the traceback is only shown in verbose mode).

@uranusjr
Copy link
Member

uranusjr commented Oct 2, 2021

I think it's also likely better to handle this in Distribution.from_wheel() instead and add a docstring describing the method should not raise BadZipError and should raise the new exception instead on an invalid wheel. A test that creates and tries to pip install an invalid zip file is needed as well.

@pradyunsg
Copy link
Member

#9964 is one example of what situations might cause this corruption.

@lukasjuhrich
Copy link
Contributor Author

Small addition: The error occurred with pip==21.2.4, and in the meanwhile, 135faab has introduced the abstraction layer of MemoryWheel / FileSystemWheel.

With that in mind, note how my stracktrace is slightly different as it would be now. It goes

  • get_wheel_distribution
  • Distribution.from_wheel
  • ZipFile.__init__

but now it would instead go

  • get_wheel_distribution
  • Distribution.from_wheel
  • FileSystemWheel.as_zipfile
  • ZipFile.__init__.

@lukasjuhrich
Copy link
Contributor Author

@DiddiLeija any pointers for which kind of test I should write? A unit test in unit.test_metadata which passes a mocked wheel which manually raises BadZipFile, or a full-on functional test where I try to load a corrupt wheel file, or something inbetween?

I'm not too familiar with pip's test concept.

@DiddiLeija
Copy link
Member

I think a functional test would be better in this case.

@lukasjuhrich
Copy link
Contributor Author

@uranusjr I followed your instructions. Also, I added a news file as suggested. A test is still missing.

I'm not sure whether InvalidWheel or CorruptWheel is preferable for an exception name.

Note that in the case of HTTP transfer, i.e. a call of dist_from_wheel_url, the wheel.location attribute will amount to the name of the temporary file created by LazyZipOverHTTP. The new message Wheel foo located at /tmp/blah won't be an improvement in this case, but it's probably not worse than an unhandled BadZipFile (and corruptions aren't likely to occur over HTTP, that's what the transport layer is for). But that's not the use case we want to improve here, anyway.

@DiddiLeija gotcha.

@uranusjr
Copy link
Member

Let's go with InvalidWheel, we can always change it if needed; pip doesn't have a public Python API so renaming things is not a big deal. Adding a new class means it'll be easier to do something to the lazy zip case if we want to, so let's get this in and think about how that can be improved afterwards.

@lukasjuhrich
Copy link
Contributor Author

Following functional.test_install_from_broken_wheel to run pip install $corrupt_file.whl, it's not quite clear to me how I should test for InvalidWheel to be thrown.

I see two options:

  1. somehow grepping through stderr
  2. parametrizing test_install_from_broken_wheel to feed it a wheel broken on the zip level as well. In both cases (BadZipFile vs. in whatever way brokenwheel-1.0-py2.py3-none-any.whl is corrupt) the observable result should be “stacktrace && installation unsuccessful”.

Writing a unit test for this behavior proved a lot simpler, so it sounds reasonable to me to go for option 2 and leave the unit test in, so we can test both the general behavior (“some stacktrace”) and the specific handling (InvalidWheel).

@uranusjr
Copy link
Member

I think option 2 makes more sense as well.

@lukasjuhrich
Copy link
Contributor Author

Done, and in addition

  • Incorporated black reformatting
  • verified that my unit test failed when undoing the change.

@lukasjuhrich
Copy link
Contributor Author

lukasjuhrich commented Oct 17, 2021

Sorry for all the force-pushes, I know it requires reapproval. I forgot to squash some stuff, I'm finished now.

news/10535.feature.rst Outdated Show resolved Hide resolved
Note that the functional test does not actually detect the behavioral
change of throwing unhandled `BadZipFile` → throwing unhandled
`InvalidWheel`, whereas the unit test does.
@uranusjr
Copy link
Member

pre-commit does not like trailing spaces.

@lukasjuhrich
Copy link
Contributor Author

pre-commit does not like trailing spaces.

Ah, I forgot to pre-commit install so I didn't notice that in this session. Does not complain locally, should be green now.

@lukasjuhrich
Copy link
Contributor Author

So sorry for the broken commit message in 6d0612d, fixed by add2ab0. Happened due to a reset so I could run pre-commit again.

@lukasjuhrich
Copy link
Contributor Author

Just checking in, is any further work required from my side?

@uranusjr
Copy link
Member

uranusjr commented Nov 9, 2021

Porbably not (at least not from me). I'll leave this open in case someone wants to take a look, but also add this to 22.0 so it can be merged if no-one has any objections.

@uranusjr uranusjr added this to the 22.0 milestone Nov 9, 2021
Copy link
Member

@DiddiLeija DiddiLeija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side, this PR is ready. Thanks @lukasjuhrich!

news/10535.feature.rst Outdated Show resolved Hide resolved
@pradyunsg pradyunsg merged commit e7c80c7 into pypa:main Nov 12, 2021
@pradyunsg
Copy link
Member

Thanks a lot for this @lukasjuhrich! ^>^

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants